SYSTEM AND METHOD FOR GENERATING
PERFORMANCE MODELS OF COMPLEX
INFORMATION TECHNOLOGY SYSTEMS

BACKGROUND OF THE INVENTION

TECHNICAL FIELD OF THE INVENTION

The present invention relates to complex information
technology systems (IT) and, in particular, to continuity
analysis techniques for discovering relations among complex
events occurring in such systems, and, more particularly, to
techniques for improving the performance of such IT systems
through iterative system modeling.

BACKGROUND AND OBJECTS OF THE INVENTION

With the exponential growth of the computer and the
computer industry, information technology (IT) systems
have become increasingly complex and difficult to manage.
A typical IT system in even a small company may contain
dozens of computers, printers, servers, databases, etc., each
component in some way connected to the others across the
interlinkage. A simplified example of an interconnected IT
system is shown in FIG. 1, described in more detail
hereinafter.

Although interconnected systems, such as the one shown
in FIG. 1, offer many advantages to the users, e.g., resource
sharing, as such systems grow and the number of component
interlinkages increase, the behavior of these complex systems
becomes more difficult to predict. Further, system
performance begins to lag or becomes inconsistent, even
becoming chaotic in nature. The addition or removal of one
component, even seemingly minor, could have dramatic
consequences on the performance of the whole system. Even
an upgrade on one component could adversely affect a
distant, seemingly unrelated component. The system and
method of the present invention is directed to techniques to
better predict the behavior of complex IT systems, offering
system administrators the opportunity to identify problem
areas such as performance bottlenecks and to correct them
prior to a system or component failure.

Conventional approaches to system performance monitoring
are inadequate to easily divine the nature of a performance
problem in a complex IT system since any data collected in
monitoring is generally useless in ascertaining the true
nature of the performance difficulty. The system and
method of the present invention, however, provide a mechanism
whereby system monitoring data is made easily accessible
and usable for analyzing current performance and
predicting future performance. The present invention
facilitates this analysis through use of data mining principles
discussed further hereinafter.

In general, data mining is an analysis of data, such as in
a database, using tools which determine trends or patterns of
event occurrences without knowledge of the meaning of the
analyzed data. Such analysis may reveal strategic information
that is hidden in vast amounts of data stored in a
database. Typically, data mining is used when the quantity of
information being analyzed is very large, when variables of
interest are influenced by complicated relations to other
variables, when the importance of a given variable varies
with its own value, or when the importance of variables vary
with respect to time. In situations such as these, traditional
statistical analysis techniques and common database
management systems may fail or become unduly cumbersome,
such as may occur when analyzing an IT system.

Every year, companies compile large volumes of information
in databases, thereby further straining the capabilities
of traditional data analysis techniques. These increasingly
growing databases contain valuable information on
many facets of the companies' business operations, including
trend information which may only be gleaned by a
critical analysis of key data interspersed across the
database(s). Unfortunately, because of the sheer volume and/or
complexity of the available information, such trend
information is typically lost as it becomes unrecoverable by
manual interpretation methods or traditional information
management systems. The principles of data mining,
however, may be employed as a tool to discover hidden
trend information buried within the pile of total information
available.

Such data mining techniques are being increasingly utilized
in a number of diverse fields, including banking,
marketing, biomedical applications and other industries.
Insurance companies and banks have used data mining for
risk analysis, for example, using data mining methods in
investigating its own claims databases for relations between
client characteristics and corresponding claims. Insurance
companies have obvious interest in the characteristics of
their policy holders, particularly those exhibiting risky or
otherwise inappropriate activities or behaviors adverse to the
companies' interests, and with such analyses, are able to
determine risk-profiles and adjust premiums commensurate
with the determined risk.

Data mining has also found great success in direct marketing
strategies. Direct marketing firms are able to determine
relationships between personal attributes, such as age,
gender, locality, income, and the likelihood that a person will
respond to, for instance, a particular direct mailing. These
relationships may then be used to direct mailing towards
persons with the greatest probability of responding, thus
enhancing the companies' prospects and potential profits.
Future mailings could be directed towards families fitting a
particular response profile, a process which could be
repeated indefinitely and behaviors noted. In this sense, the
data mining analysis learns from each repeated result,
predicting the behavior of customers based on historical
analysis of their behavior.

In the same manner demonstrated hereinabove, data mining
may also be employed in predicting the behavior of the
components of a complex information technology (IT)
system, such as the one shown in FIG. 1 or a more
complicated one found in the business environment. Similar
approaches as above with appropriate modifications can be
used to determine how the various interconnected components
influence each other, uncovering complex relations
that exist throughout the IT system.

As discussed, multiple applications will be operated
within a common IT infrastructure, such as the one shown in
FIG. 1. Often, these applications will utilize some of the
same resources. It is obvious that the sharing of IT
infrastructure resources among different applications may cause
unexpected interactions on system behavior, and that often
such unexpected interactions, being non-synergistic, are
undesirable. An example would be multiple business
applications sharing a router within an IT system. As illustrated,
a particular application, e.g., an E-mail service, burdens a
router in such a way that other applications do not function
well. In this example, it is reasonable to expect numerous
applications to, at times, share usage of the router.
Traditional systems management techniques may prove difficult
in determining which specific application is causing loss of
system performance. This example further explains why
there is a need to find hidden relationships among IT system
components and applications running in such environments.
By way of solving the problem in this example, it may be
necessary to reroute E-mail traffic through another router to
obtain adequate performance for the other applications.

Traditional IT system management is now generally
defined as including all the tasks that have to be performed
to ensure the capability of the IT infrastructure of an
organization to meet user requirements. Shown in FIG. 2 is
a traditional IT systems management model, generally
designated by the reference numeral 200. Essentially, there are
groups of system administrators 210 having knowledge of
the IT infrastructure, such as the one shown in FIG. 1 and
generally designated herein by the reference numeral 220,
which they are managing. Typically, the knowledge of the
infrastructure 220 is scattered among the various personnel
making up the system administrator group 210. The total of
this knowledge is limited to the sum of the individual
administrators' knowledge, where invariably there is a great
deal of redundancy of knowledge. This redundancy may be
considered an inefficiency of the overall knowledge base. In
other words, a theoretical maximum knowledge of the
infrastructure 220 would be realized only when each
individual administrator of the administration group 210 had
knowledge that was unique to that specific administrator.
While this may appear to be an ambiguous analysis of the
effectiveness of the group, it is of real consequence for the
company that must finance a group of administrators.
Furthermore, this knowledge is typically not stored in an
easily retrievable electronic form.

When system monitoring is included in the aforementioned
traditional management system, this monitoring is
usually limited to real time data, such as the current system
load and the like. An administrator may observe such
reporting of real time data, and if system loads or events
being monitored are found to be consistent with loads that
the administrator recognizes to be associated with impending
system malfunction or loss of performance, that administrator
may redirect part of the load through alternative
subsystems of the IT infrastructure to avert problems.

Often, such real time data reporting may be used in
coordination with a system model of the IT system, of which
data is being collected and reported. The model usually
includes a computer algorithm that utilizes code governing
the relations among various system devices. A problem with
such models, however, is that the relations used in modeling
the system account only for expected interactions among
components and subsystems. The model is, therefore,
merely an idealized model of the actual system. Hidden or
unexpected relations that exist between components would
not be accounted for. Furthermore, as the infrastructure 220
is modified, the model must be manually altered to include
new relations in the model algorithm to account for the
changes made.

An improvement over this traditional management system
is realized in the so-called expert system. An expert system
is a form of artificial intelligence in which a computer
program containing a database, frequently referred to as a
knowledge base, and a number of algorithms used to
extrapolate facts from the programmed knowledge and new
data that is input into the system. The knowledge base is a
compilation of human expertise used to aid in solving
problems, e.g., in medical diagnosis. The utility of the expert
system is, however, limited to the quality of the data and
algorithms that are input into the system by the human
expert.

Typically, expert systems are developed so that knowledge
may be accumulated from a person or persons skilled
in a specific area of technology and stored in an easily
retrievable media. This way, persons less skilled than the
experts, whose knowledge was accumulated within the
expert system, have access to such expert information. In
this manner, a company may save human and financial
resources by having less skilled personnel access such
expert systems instead of requiring the expert to handle all
of such situations requiring a certain level of knowledge.

Utilization of such expert systems allows less skilled
persons to also analyze IT systems behavior. These systems
may be used to aid in troubleshooting faults in an IT system
or they may be used to assist in predicting such faults with
the assistance of system performance monitors, i.e., a person
with access to an expert system applied to a particular IT
system may, through appropriate monitors, study system
load parameters or the like and through the use of the expert
system, make estimates of potential faults due to system
bottlenecks or the like.

A significant drawback of expert systems, however, is that
they are poorly equipped to handle newly encountered
problems or situations. In this manner, it is clear that expert
systems are limited in their technical capability of resolving
novel issues. Instead, expert systems require a complete
model of all the events or failures that can occur in the
system being modeled.

The present invention is a further progression on the
aforedescribed conventional art. In a manner similar to the
way in which data mining techniques are applied to predict
the behavior of, for instance, the customers in the direct
marketing example, the idea of such techniques may likewise
be applied to complex IT systems in determining and
predicting the behavior of IT components. The system and
method of the present invention, when implemented, facilitate
the determination of how the interlinked components
influence each other in terms of performance, potentially
uncovering unexpected relations among different components
of an IT system. This is accomplished using a continuity
analysis performed in conjunction with the aforementioned
data mining techniques on historical IT system and
subsystems state and simulation test data.

It is clear that with today's increasingly interconnected
and complex IT infrastructures and the corresponding
increases in maintenance costs of such systems, a system
and method for discovering deleterious relationships
between various subsystems and elements of such complex
networks in a substantially automated manner is certainly a
valuable tool.

It is also an object of the present invention to have an
automated means of accumulating the assortment of data
that may be analyzed by an appropriate data mining
technique, such that performance models of complex IT
systems based on periodic measurements of predefined
performance levels may be generated or updated. Additional
description on data mining techniques applied in the context
of the present disclosure may be found in Applicants'
co-pending patent application, U.S. patent application Ser.
No. 09/036,394, entitled "System and Method for Model
Mining Complex Information Technology Systems", filed
concurrently herewith, which is incorporated herein by
reference.

Another desirable feature of an IT system, such as one
incorporating the improvements of the present invention, is
to reduce the amount of human intervention required for the
system to adapt to dynamic system changes. This is preferably
accomplished through automation.

It is further desired that the system and method of the
present invention analyze system performances with Boolean
attributes, i.e., true or false.
SUMMARY OF THE INVENTION

The present invention is directed to a system and method
for automatically creating performance models of an
information technology (IT) system by use of a continuity
analysis, preferably in conjunction with data mining
techniques. Adaptive system management is defined as the
realization of proactive system management with adaptive
techniques that automatically create models of the system
and that can learn to plan and predict the effects of
management actions in order to meet the various user
requirements. IT Service Level Agreements (SLAs), or
performance requirements, are predefined constraints or thresholds
placed on the system. Performance monitoring of the system
is then implemented, from which databases of system state
information are determined and stored.

A continuity analysis is then performed on the IT system
or subsystem thereof by synchronizing SLA performance
simulations with system monitoring activity, and accumulating
both in a historical database. A model of the system
environment is then used as input for the continuity analysis.
The environment may be defined with any level of detail and
is not necessarily a complete or consistent model of the
actual system. The system and method of the present invention
is preferably implemented with a collection of data
monitors placed throughout the system. These monitors
periodically check the state of various elements of the
system, storing the monitored data in a database.

A test program is then executed, with execution being
synchronized with relative monitoring activity, to simulate
specific IT system actions related to a specific predefined
SLA. Execution of the test programs, and the monitoring
activities, are preferably performed automatically and at
fixed intervals of time. Results of the test program are time
measurements of the SLA-related actions, which are preferably
expressed as real numbers, and which are stored in a
database with a time stamp and corresponding monitored
system data or equivalently, in an array type data storage
scheme. Additional input includes the SLAs themselves.
These thresholds are used to convert the real numbers from
the test program into Boolean values, these Boolean values
indicating whether or not the predefined threshold was
exceeded. This Boolean information is then output to
characterize the influence of the various monitor values on
the targeted performance variable, or the SLA. This information
may then be used in a number of ways, including
trend analysis, performance optimization, and monitor
optimization.

BRIEF DESCRIPTION OF THE DRAWINGS

A more complete understanding of the system and method
of the present invention may be had by reference to the
following detailed description when taken in conjunction
with the accompanying drawings wherein:

FIG. 1 is an exemplary network system upon which the
system and method of the present invention may be
employed;

FIG. 2 is a block diagram of a traditional IT systems
management method;

FIG. 3 is a block diagram of a system and method for
adaptive system management in accordance with the present
invention;

FIG. 4 is a sample output decision tree using several
systems attributes;

FIG. 5 is a scatter diagram of access time attributes for a
conventional system; and

FIG. 6 is a second sample output decision tree utilizing
other system attributes.

DETAILED DESCRIPTION OF THE
PRESENTLY PREFERRED EXEMPLARY
EMBODIMENTS

The present invention will now be described more fully
hereinafter with reference to the accompanying drawings, in
which preferred embodiments of the invention are shown.
This invention may, however, be embodied in many different
forms and should not be construed as limited to the
embodiments set forth herein; rather, these embodiments are
provided so that this disclosure will be thorough and complete,
and will fully convey the scope of the invention to those
skilled in the art.

FIG. 3 shows a model of an adaptive system management
scenario 300 in accordance with the system and method of
the present invention. The application of data mining on an
information technology (IT) system, such as the one shown
in FIG. 1 and generally designated by the reference numeral
305, is illustrated in FIG. 3, in which the IT system 100/305
is connected to at least one monitor 310 which monitors the
performance of the IT system 305. The monitor 310 is
connected to a historical database 315, which is used to store
various performance measurements on the IT system 100.
The historical database 315, in turn, is connected to a
number of learning algorithms 320. Elements or events
relating to the IT system or infrastructure 305 are monitored
throughout the system by appropriate monitoring schemes
housed within the monitors 310.

Data from the aforementioned monitoring is forwarded by
the monitors 310 and input into the historical database 315.
The data within the historical database 315, including the
newly updated information on the IT system 305 performance,
is then subjected to specific learning algorithms 320.
The learning algorithms 320 may recognize new patterns or
relationships between discrete events occurring in the IT
system 305. The learning algorithms 320 then update an
adaptive model of the IT infrastructure, generally designated
herein by the reference numeral 325.

The management environment stores all collected information
and uses various learning techniques to learn about
the IT system 305 being managed. It should be understood
that the aforementioned learning algorithms 320 are
well-known to those skilled in the art. These learning techniques
enable the management environment to better adapt itself to
the IT infrastructure 305 being managed. Accordingly, once
additional information becomes available about the IT
infrastructure 305, better management of the system environment
is possible. Further information will then be collected and
stored so that the learning process continues. In fact, the
entire monitoring, learning, and adapting process provided
by the system and method of the present invention is
continuous and iterative.

In devising such a dynamic learning model as disclosed in
the present invention, it is first necessary to define thresholds
for various system performances. These thresholds are
hereinafter referred to as service level agreements or SLAs,
which in the present invention are simply a numerical
threshold used to evaluate a particular performance level of
any number of system components or elements. The SLAs
serve to convert numerical formatted data that is monitored
into Boolean values indicating whether the SLA threshold
was met or not.

As an example of such an SLA, reference is now made to
a database 105 in FIG. 1 which is resident on a system server
110 such that numerous and diverse users may query the
database 105. In querying database 105, it is reasonable that 
a login must be first performed through the server 110. This 
login, however, may conventionally be performed through 
another server and database, which is shown in FIG. 1 as 
server 120 and database 115, respectively. Therefore, for a 
system user to remotely query database 105, login is first 
executed through database 115, which upon a successful 
login grants the user rights to query database 105. For this 
entire operation, a performance threshold may be estab- 
lished by knowledgeable management personnel, designated 
in FIG. 2 by the reference numeral 230. Typically, such a 
threshold would be formed with knowledge of the server 110 
and 120 performances on which databases 105 and 115, 
respectively, reside and a general knowledge of data traffic 
through these servers. 

For this example, assume the startup time of database 105
is a reasonable measure of the performance of database 105.
Therefore, the targeted performance level of database 105,
i.e., its SLA, could be constructed from the access times of
both databases 105 and 115. Here, the SLA may be
delineated as SLA_A, where A represents database 105. Since
access time has been assumed to be a good measure of
performance of such an application, total access time for
database 105 includes the access time of database 115 since
effective execution of database 105 is prolonged by the
execution of database 115 which is also referred to herein by
the reference indicator B. For this case, the total access time,
AT_AB, for the startup of database 105 may be found from the
sum of the startup times of the individual databases, AT_A and
AT_B, in other words,

AT_AB = AT_A + AT_B

Assume that the study of the individual applications and
the hardware on which these applications are executed
indicates that it is reasonable for the execution of
database 115 to take place in no more than 1 second and
subsequent execution of database 105 in no more than 2
seconds. From this information, the target for total startup
time of database 105, AT_AB, would be for the execution of
database 105 in no longer than 3 seconds. This threshold for
execution of database 105, including the required access
time of database 115, could then be defined for the SLA of
database 105, hereinafter designated as SLA_AB. This SLA
would appropriately be recorded as:

SLA_AB ≤ 3 seconds

This SLA would indicate, in a Boolean format, that execu- 
tion of database 105 in a time of less than or equal to 3 
seconds is satisfactory, e.g., a logical one, and an execution 
time exceeding 3 seconds is unsatisfactory, e.g., a logical 
zero. Alternatively, individual thresholds may be defined for 
databases 105 and 115 and a threshold for overall perfor- 
mance of database 105 obtained by simply summing the 
individual thresholds, as follows: 

SLA_A ≤ 2 seconds
SLA_B ≤ 1 second
SLA_AB ≤ 3 seconds
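The threshold comparison described above can be illustrated with a short sketch. The following is only a minimal illustration, assuming the thresholds of the example (2 seconds for database 105, 1 second for database 115); the function names meets_sla and evaluate_startup and the sample measurements are hypothetical and are not part of the disclosure.

```python
# Thresholds in seconds, taken from the example in the text.
SLA_A = 2.0              # database 105 (A)
SLA_B = 1.0              # database 115 (B)
SLA_AB = SLA_A + SLA_B   # overall threshold obtained by summing


def meets_sla(access_time: float, threshold: float) -> bool:
    """Return True (logical one) when the measured time is within the SLA."""
    return access_time <= threshold


def evaluate_startup(at_a: float, at_b: float) -> dict:
    """Convert measured access times into Boolean SLA attributes.

    at_a and at_b are hypothetical measurements, in seconds, for
    databases 105 and 115 respectively.
    """
    at_ab = at_a + at_b  # total access time AT_AB = AT_A + AT_B
    return {
        "AT_AB": at_ab,
        "SLA_A": meets_sla(at_a, SLA_A),
        "SLA_B": meets_sla(at_b, SLA_B),
        "SLA_AB": meets_sla(at_ab, SLA_AB),
    }


if __name__ == "__main__":
    # Illustrative measurements only.
    print(evaluate_startup(at_a=1.5, at_b=1.25))
```

In this sketch an individual threshold may be violated while the summed threshold is still met, which is one reason the individual and overall SLAs may be worth recording separately.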

In defining such thresholds, it should be apparent that the
greater the number of SLAs and monitors 310, shown in
FIG. 3, monitoring the IT system 100, shown in FIG. 1, the
better the system may be evaluated. Ideally, the majority of
IT system 100 components would have SLAs associated
with them. Realistically, however, extensive system
monitoring presents logistical problems, generally resulting in
simpler rather than more complicated models. Nonetheless,
as is apparent to those skilled in the art, the greater the
number of SLAs that may be defined and implemented
within the IT system 100, the greater the accuracy of the
system model and technique of the present invention in
monitoring system performance.

In order to apply the aforementioned data mining
techniques and learning algorithms to historical data on the IT
system 100, it is first necessary to build the aforementioned
historical database 315, as shown in FIG. 3. It has been
determined that the most advantageous method of storing
such data is in a conventional relational database format.
Typically, all monitored data from the monitors 310 are
directed to one central storage location, i.e., the historical
database 315. It should be understood, however, that each
monitor 310 may have its own local memory 330 for storing
the monitoring data temporarily, e.g., over a minute, hour,
etc., and then later sent to the central historical database 315
where the aforementioned data mining applications may be
used to analyze the data.
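By way of illustration only, the storage arrangement just described might be sketched as follows. The sketch assumes a single relational table in SQLite; the table name monitor_data, its columns, and the helper names create_historical_db and flush_local_buffer are hypothetical choices made for this example and are not specified in the disclosure.

```python
import sqlite3


def create_historical_db(path: str = ":memory:") -> sqlite3.Connection:
    """Create a (hypothetical) central historical database."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS monitor_data (
               ts         TEXT NOT NULL,  -- time stamp of the sample
               monitor_id TEXT NOT NULL,  -- which monitor produced it
               attribute  TEXT NOT NULL,  -- e.g. 'disk_free_mb'
               value      REAL NOT NULL
           )"""
    )
    return conn


def flush_local_buffer(conn: sqlite3.Connection, monitor_id: str,
                       buffer: list) -> int:
    """Send a monitor's temporarily stored samples to the central database.

    buffer holds (ts, attribute, value) tuples kept in the monitor's
    local memory; it is emptied once the rows are committed.
    """
    rows = [(ts, monitor_id, attr, val) for ts, attr, val in buffer]
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT INTO monitor_data VALUES (?, ?, ?, ?)", rows
        )
    buffer.clear()
    return len(rows)


if __name__ == "__main__":
    db = create_historical_db()
    local = [("2000-01-01T00:00:00", "disk_free_mb", 512.0),
             ("2000-01-01T00:01:00", "disk_free_mb", 510.5)]
    print(flush_local_buffer(db, "monitor-1", local))
```

A long, narrow table of this kind is only one possible layout; a wide table with one column per monitored attribute would serve equally well for the analysis described hereinafter.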

It should be understood that the data monitors 310 may be
placed throughout the IT infrastructure 100/305 at various
components within the system. Monitoring activity may be
directed to any number of components, applications or other
resources with, in general, the overall effectiveness of the
present invention enhanced with a corresponding increase in
the number of monitors 310 being utilized. These monitors
310 preferably perform their specific monitoring activity
automatically and at specific time intervals, collecting data
periodically, e.g., once every minute, ten minutes, hour, etc.
The type of data being monitored and stored in the historical
database 315 may be generally described as state or usage
information on a component level, e.g., a harddisk, database,
server or other network segment such as the components
shown in FIG. 1. For instance, a monitor 310 used to monitor
and record historical data on a particular harddisk may
record the free capacity of the disk and whether the disk is
being accessed or not. Similar data collected from
monitoring a database may include the number of users accessing
the database, query volume, and access time.
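A monitor of the kind described above, recording the free capacity of a disk at fixed intervals, might be sketched as shown below. This is a minimal illustration only: the function names sample_disk and collect, the record fields, and the choice of the Python standard library call shutil.disk_usage are assumptions of the example, and the "being accessed" flag mentioned in the text is omitted because no portable way of reading it is assumed here.

```python
import shutil
import time
from datetime import datetime, timezone


def sample_disk(path: str = ".") -> dict:
    """Record one time-stamped sample of disk state for the given path."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free
    return {
        "ts": datetime.now(timezone.utc).isoformat(),
        "free_bytes": usage.free,
        "total_bytes": usage.total,
    }


def collect(sample_fn, interval_s: float, count: int,
            sleep=time.sleep) -> list:
    """Call sample_fn `count` times, waiting interval_s between samples."""
    samples = []
    for i in range(count):
        samples.append(sample_fn())
        if i < count - 1:
            sleep(interval_s)
    return samples


if __name__ == "__main__":
    # Three samples one second apart; a deployed monitor would more
    # likely sample once every minute, ten minutes, or hour.
    for record in collect(sample_disk, interval_s=1.0, count=3):
        print(record)
```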

In order to perform the continuity analysis on the system
100, it is necessary to evaluate specific system functions
over set and defined intervals. For this reason, test programs
are utilized to evaluate whether the system 100 is performing
within one or more of the aforementioned SLAs. For
example, it would not be effective to measure and evaluate
a specific action against its SLA only when that action is
taken by a person on the network. Such actions would most
likely occur pseudo-randomly and would, therefore, not give
good indications of the overall performance of the system
100 with respect to time.

To evaluate the system more effectively, test programs are
used to simulate those functions that have associated SLAs.
In utilizing test programs at defined moments in time,
continuity analyses may be performed on the test and
monitored data as functions of time. For example, in the case
of the SLA used for the startup times of databases 105 and
115, a test program would be set up on the server side of the
network 100 to simulate a query to these databases. This test
program would preferably be executed automatically and at
fixed intervals of time. Furthermore, this test program would
be substantially synchronized with monitoring events
related to the evaluation of the corresponding SLAs.
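One way in which a test program might be synchronized with monitoring activity is sketched below. The sketch is illustrative only and rests on several assumptions: the function name run_synchronized, the record layout, and the idea of sampling the monitors immediately before timing the simulated action are hypothetical and are not taken from the disclosure.

```python
import time
from datetime import datetime, timezone


def run_synchronized(test_fn, monitor_fns: dict, threshold_s: float) -> dict:
    """Time one simulated action and pair it with monitor readings.

    test_fn      -- callable simulating the SLA-related action
    monitor_fns  -- mapping of attribute name to a callable returning
                    the current monitored value
    threshold_s  -- SLA threshold in seconds
    The test result and the monitored values share one time stamp.
    """
    ts = datetime.now(timezone.utc).isoformat()
    monitored = {name: fn() for name, fn in monitor_fns.items()}

    start = time.perf_counter()
    test_fn()
    elapsed = time.perf_counter() - start

    record = {"ts": ts, "elapsed_s": elapsed,
              "sla_met": elapsed <= threshold_s}
    record.update(monitored)
    return record


if __name__ == "__main__":
    # Hypothetical stand-ins for a simulated query and a user-count monitor.
    simulated_query = lambda: time.sleep(0.1)
    monitors = {"users": lambda: 4}
    print(run_synchronized(simulated_query, monitors, threshold_s=3.0))
```

Executing such a function from a scheduler at fixed intervals would yield a time series of records in which each Boolean SLA outcome sits beside the monitor values observed at substantially the same moment.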

For the example of SLA_AB as previously defined, a test
program to simulate the startup of database 105, with the
inclusive startup of database 115, is required. It should be
understood that the test program may be executed on the
server or client side, the preference being to have the test
program executed on both sides. Executing the test program
on both the client and server side, however, requires separate
SLAs on both system sides. For simplicity of discussion,
consideration will only be given to server-side evaluation
hereinafter. Therefore, a test program or simulation is
performed on the server side that simulates a query on database
105. In doing so database 115 must first accept a login. This
login is included in the simulation. The test program
executes the login and database query, recording the startup
time of database 115 and database 105. These startup times
recorded from the test program are generally numerical in
nature, and are subsequently converted to Boolean values
through the aforedescribed comparisons to the associated
SLAs. For this example, assume that on execution of the test
program for a query to database 105, startup time for
database 115 was recorded to be 1.25 seconds while
subsequent startup time of database 105 was recorded to be 1.5
seconds. The access times, AT, of both would be recorded
similar to that given below:

AT_B = 1.25
AT_A = 1.5

The total startup time of database 105 including the
prerequisite startup time of database 115 is simply the sum of the
two startup times, i.e., AT_AB = 2.75 seconds.

The associated SLAs, previously defined, are again given
below:

SLA_A ≤ 2 seconds
SLA_B ≤ 1 second
SLA_AB = SLA_A + SLA_B ≤ 3 seconds

Failure to meet the requirements of an SLA may be assigned
a Boolean low, i.e., False or logical zero, and performances
meeting the pertinent SLA being assigned a Boolean high,
i.e., True or logical one. The numerical results of the test
program may then be converted to Boolean attributes by

ing data effectively shares the time stamp with the test
program results.

A final input is an original system model, upon which the
system and method of the present invention builds,
improving the accuracy and performance of the underlying system
100, as shown in FIG. 1. It should be understood that the
model of the IT system 100 is preferably developed such that
it supports the functions for which SLAs are defined. It
should also be understood, however, that the model may be
defined at any level and it is not necessary that the model be
complete or consistent, as is the case for expert systems. This
is due to the iterative adaptiveness of the overall system
and method of the present invention in that over time the
model automatically refines and corrects itself.

With the discussed inputs considered, the output of the
system and method of the present invention may now be
considered. Once sufficient historical data has been collected
and stored in the database 315, data mining techniques
familiar to those skilled in the art may be
applied to this collection of monitored data and its
associated test data. Data mining techniques are then applied to
these data and the various relations between the monitored
system state data and the data on test performance success
or failure are uncovered. These newly discovered relations
are then used to update the existing IT model, thereby
rendering the model adaptive. This unique feature of the
present invention, i.e., its ability to adapt itself to the system
it is used to monitor and model, enables the model
to be incomplete or inconsistent.

A decision tree algorithm is preferably utilized in the
output, where the value evaluated from the test
program data and the corresponding SLA is used as the
target attribute of the decision tree. Although decision tree
induction methods are well known to those skilled in the art,
FIG. 4 is provided herein to illustrate its usage. In operation,
a targeted system component is selected, either by an
administrator or autonomously, for analysis, and a decision
tree 400 generated. This target component forms a root node
405 of decision tree 400.

The specific example illustrated in FIG. 4 shows a
decision tree 400 for a query to the aforementioned database
115(B) of FIG. 1, where the performance of the query
(QUERY_B) through the system 100 is targeted for analy-

comparisons to their respective associated SLA thresholds. The 50% noted ai the target element 405 indicates that 

In doing so. thc test program results of thc current example mis lar g et nas been determined to be satisfied in 50% of the 

would respectively be assigned Boolean values as follows: « instances, i.e., thc target SLA (access Ume less than or equal 

to one second) was satisfied half the Ume. Thc numerical 

Docs_pcrformancc of A m tc 1 SLA A ?-TRUE value following the success percentage, i.e., 800, is simply 

an indication of the number of instances at which state data 

Does perfom«nce_of_B_iiieet_SLA fl ?-FALSE was recorded over a given time period. In other words, at 

Does_pc r f 0 rm«nce_of_AB_meet_si^ /U ,7-TRUE 50 mis root tewl of analysis, in 800 queries of database 115, the 

aforedescribed target SLA of 1 second was mel only half the 

As indicated hereinbefore, A indicates database 105 and B time. 

indicates database 115. The branches of the decision tree 400 from the root node 

These Boolean test program attributes will then be stored, 405, i.e., an upper 410 and a lower 415 branch or element, 

e.g., in logical format, typically along with their numerical 55 also include monitored values and their determined relation 

counterparts, in the aforementioned historical database 315, to the performance success or failure of the target element 

as is understood in thc relational database art. Preferably, the 405. Thc upper clement 410 of the first branch, for instance, 

numerical and Boolean values would each be assigned indicates the effect of thc number of network file server 

separate fields within the database 315, as is also understood (NFS) daemons on the success or failure of the target 

in the relational database art. Associated with these records 60 element 405. Branch 410 indicates that when the number of 

is a clock or lime stamp indicating the position in history at NFS daemons is greater than ten, the target element 405 

which that test data was gathered. This time stamp is (over a sample size of 350) was found to have acceptable 

preferably allocated a separate field for each record or ^performance 90% of the time. The evaluation of whether the \ 

monitoring event in the historical database. Since system target element 405 performance is acceptable is determined I 

monitoring is synchronized with tie execution of the afore- 65 according to methods earlier discussed, specifically the \ 

described test program, system state monitoring data is methods of definition and evaluation of the performance i 

stored concurrently with test program results. This monitor- thresholds or SLAs. 
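
As a minimal sketch of the threshold evaluation referred to above, the following Python fragment converts the numerical access times of the worked example into Boolean SLA attributes. The class name `SlaCheck` and its field names are illustrative assumptions and do not appear in the specification; only the access times (1.25 and 1.5 seconds), the thresholds (2, 1 and 3 seconds) and the less-than-or-equal comparison are taken from the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SlaCheck:
    """One test-program measurement paired with its SLA threshold (hypothetical type)."""
    name: str
    access_time_s: float  # numerical value recorded by the test program
    threshold_s: float    # SLA threshold in seconds

    @property
    def met(self) -> bool:
        # Boolean high (True) when the access time is within the threshold.
        return self.access_time_s <= self.threshold_s


# Values from the worked example: A is database 105, B is database 115.
at_a, at_b = 1.5, 1.25
sla_a, sla_b = 2.0, 1.0

checks = [
    SlaCheck("A", at_a, sla_a),
    SlaCheck("B", at_b, sla_b),
    SlaCheck("AB", at_a + at_b, sla_a + sla_b),  # combined startup time and SLA
]

# Boolean attributes as they might be stored next to the numerical values.
results = {check.name: check.met for check in checks}
print(results)  # {'A': True, 'B': False, 'AB': True}
```

Keeping the numerical value and the derived Boolean in the same record mirrors the separate database fields described earlier.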
The lower branch 415 from the root node 405 indicates that when the
number of NFS daemons is ten or less, the target element 405 (over a
sample size of 450) has acceptable performance only 20% of the time.
As shown in FIG. 4, the lower branch 415 is further split into
sub-branches 420 and 425, denoting additional system attributes
concerning the target element 405. Sub-branch 420 indicates that when
the number of NFS daemons is less than or equal to ten and the number
of logons to database 115 is greater than four, the performance of
the target element 405/database 115 (over a sample size of 20) is
acceptable only 1% of the time, clearly demonstrating a system
resource problem. The other sub-branch 425 indicates that when the
number of NFS daemons is ten or less and the number of database
logins is four or less, the target element 405 (over a sample size of
430) has acceptable performance 40% of the time.

Since the Boolean evaluations of the test programs are recorded in
the historical database 315 shown in FIG. 3 with associated monitored
system state data, and due to the Boolean values of the SLA
parameters being used as target attributes in the decision tree, the
decision tree 400 describes the influence of the monitor values, and
thus system component states, on the target attributes. Factors on
system component states that affect system performance the most
appear close to the root node 405 of the tree 400. This can be seen
in the example depicted in FIG. 4 where the first branch gives
obvious indication of the most causal relations affecting performance
of the target element 405.

It should be understood, however, that the aforementioned dependency
relation between the numbers of NFS daemons and database 115 logons
has a high association, i.e., the aforedescribed samples of the
states of the system 100 have a strong correlation. The results of
the decision tree 400 may provide support for an existing model of
the system 100, which has already identified these dependencies, or
unearth a new relationship not defined in the model. In this manner
the system model may be updated and refined to better describe the
behavior of the system 100. Further description on the use of the
aforementioned data mining principles in a model mining context is
found in Applicants' aforementioned co-pending patent application.

As another example of the use of the aforedescribed decision trees,
shown in FIG. 5 is a scatter diagram illustration of monitored values
within the IT system 100 over time, particularly, the system access
times to database 115. As is apparent from the diagram, although
performance was good initially (most values at one second), over
several weeks performance slowly decreased with most access times
increasing to two, three and even four seconds. Thus, the associated
SLA for accessing database 115 is increasingly not met and an
analysis of system performance is necessary to ascertain the
source(s) of the problem.

With reference now to FIG. 6, there is shown another decision tree
600 which is used in reviewing the impacts of various system
attributes and determining the overall performance or "health" of the
aforedescribed system 100, such as one exhibiting the performance
problems shown in FIG. 5. With reference to the decision tree 600, it
is apparent that the most important attribute in this IT system 100
for a query to database 115/B (root node 602) is the amount of paging
space available, an indirectly influenced attribute. Queries to
database 115 (in a sample size of 3,749) resulted in a 41.5% success
rate in this underperforming system 100.

The branches of decision tree 600 from the root node 602, i.e., an
upper 604 and a lower 608 branch or element, further define Boolean
attributes for the paging space. For example, node 604 indicates that
when the paging space is greater than a 685.5 system value, the
target element 602, i.e., the query to database 115, is satisfied
75.9% of the time (over a sample size of 1,229), and node 608
indicates that when the paging space is less than or equal to 685.5,
the target element 602 attribute is satisfied but 24.7% of the time
(over a sample size of 2,520). One conclusion can already be made
from the decision tree 600, i.e., performance improvement can be
gained simply by increasing hardware, especially hard disks and
memory, thereby increasing the chance that the 685.5 threshold is
met.

Upper branch 604 in FIG. 6 is further divided into two sub-branches,
i.e., an upper 610 and a lower 612 sub-branch. If the central
processor (CPU) of one of the servers, such as the one servicing
gateway database 115, is idle less than 63% of its uptime (sub-branch
610), then performance drops to 36.2% (in a sample size of 381). In
other words, if the CPU becomes more active, system performance
suffers accordingly. Conversely, if the CPU idle is greater than or
equal to 63%, indicating greater CPU processing capability
(sub-branch 612), system performance markedly increases to 94% (in a
sample size of 848). As above, performance improvement is gained by
ensuring processor availability, e.g., by installing a more powerful
processor or additional processors.

It should be understood that the previous examples depicted in FIGS.
4 and 6 are merely hypothetical and intended only to demonstrate the
functionality of the present invention. Decision trees used in the
present invention would likely involve a great number of branches and
relations depicted by these branches. Furthermore, it should be
apparent that separate decision trees would exist for each individual
attribute targeted for evaluation, and that different attributes
could be targeted, generating different decision trees which would
offer further insight into system 100 functionality as demonstrated
when comparing FIGS. 4 and 6.

It should further be understood that trend analysis may be performed
to predict potential system failures at one or more target components
at a future date. In particular, regression analysis can be performed
on the parameters close to the root node, e.g., 405 or 602, to
predict whether or not the system component will remain in a "bad"
branch of the decision tree, i.e., the component consistently
underperforms. It should also be understood that conventional
regression analysis may be employed in performing these predictions,
e.g., by utilizing a least-squares method to calculate a straight (or
other) line that best fits the available data, such as the nodal
parameters in the decision tree. Future system performance of
targeted components may then be extrapolated and the requisite
predictions made.

One problem with the above scheme, however, is attribute
overshadowing by other attributes. Overshadowing occurs when
different attributes would cause a similar split for the target
attribute (the query to database 115). The better attribute, i.e.,
the one better describing the nature of the target, would appear in
the decision tree as taking away the effect of splitting on the
similar attribute. This occurrence could, therefore, omit attributes
from the decision tree that may be very indicative of the health of
the overall system 100, such attributes being overshadowed by the
locally better attribute. In an effort to avoid the effects of
attribute overshadowing, an attribute list may be constructed which
identifies those attributes exhibiting the best indications for the
health of the system. Such an attribute list may be formed by
repeatedly constructing a decision tree of depth 1 and putting the
first attribute of the tree into the attribute list, simultaneously
removing that inserted attribute from an input attributes list. In
other words and with reference to FIGS. 4 and 6, the attributes for
paging-space and NFS daemons would be included in the list, along
with any other correlated attributes.
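
The attribute-list construction described above can be sketched as follows. This is a hedged illustration, not the specification's implementation: the split criterion (information gain over Boolean attributes) is an assumption, since the text does not name one, and the attribute names `nfs_daemons_gt_10`, `paging_space_gt_685`, `printer_online` and `sla_met`, together with the sample rows, are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Row = Dict[str, bool]


def entropy(labels: Sequence[bool]) -> float:
    """Binary entropy of a list of Boolean target values."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def information_gain(rows: List[Row], target: str, attr: str) -> float:
    """Entropy reduction achieved by splitting the rows on one attribute."""
    gain = entropy([r[target] for r in rows])
    for value in (True, False):
        subset = [r[target] for r in rows if r[attr] == value]
        gain -= len(subset) / len(rows) * entropy(subset)
    return gain


def build_attribute_list(rows: List[Row], target: str,
                         candidates: Sequence[str], limit: int) -> List[str]:
    """Repeatedly take the root attribute of a depth-1 tree, then remove it."""
    remaining = list(candidates)  # the input attributes list
    ranked: List[str] = []        # the resulting attribute list
    while remaining and len(ranked) < limit:
        # The root of a depth-1 tree is the single best-splitting attribute.
        best = max(remaining, key=lambda a: information_gain(rows, target, a))
        ranked.append(best)
        remaining.remove(best)
    return ranked


# Hypothetical monitored states; the last field is the Boolean SLA result.
FIELDS = ("nfs_daemons_gt_10", "paging_space_gt_685", "printer_online", "sla_met")
SAMPLES = [
    (True, True, True, True), (True, True, False, True),
    (True, True, True, True), (True, False, False, True),
    (False, False, True, False), (False, False, False, False),
    (False, True, True, False), (False, False, False, False),
]
rows = [dict(zip(FIELDS, sample)) for sample in SAMPLES]

attribute_list = build_attribute_list(rows, "sla_met", FIELDS[:3], limit=2)
print(attribute_list)  # ['nfs_daemons_gt_10', 'paging_space_gt_685']
```

Because each round considers only the attributes still in the input list, an attribute that a full tree would hide behind a stronger one still gets ranked in a later round.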

As discussed, a number of benefits can be realized with 
the generation of the aforementioned decision trees, e.g., 
trend analysis for predicting future system failures and 
performing preventive maintenance. Performance optimiza- 
tion is readily apparent in reviewing the output of the 
decision trees, e.g., the increase in memory and daemon 
resources. It should be understood that since parameters 
close to the root of the decision tree generally have the 
greatest influence on performance, different actions might be 
suggested to optimally influence those parameters. Monitor 
310 optimization is another benefit that may be realized 
from the implementation of the principles of the present 
invention. Based on an analysis of the decision trees, certain
monitors 310 may be more or less relevant than other
monitors with respect to a particular SLA. The positions of
these monitors 310, or the monitors' frequency of data
capture, could then be adjusted accordingly to facilitate a
better analysis of the system 100. 
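
The trend analysis mentioned among the benefits above relies on a conventional least-squares fit. The sketch below fits a straight line to weekly average access times and extrapolates it. The function names and the sample data are hypothetical, chosen only to resemble the gradual degradation described for FIG. 5; they are not taken from the figures.

```python
from typing import Optional, Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict(slope: float, intercept: float, x: float) -> float:
    """Extrapolate the fitted line to a future point."""
    return slope * x + intercept


def crossing_point(slope: float, intercept: float,
                   threshold: float) -> Optional[float]:
    """x at which the fitted line reaches the threshold, or None if it never rises to it."""
    if slope <= 0:
        return None
    return (threshold - intercept) / slope


# Hypothetical weekly average access times in seconds.
weeks = [0, 1, 2, 3, 4]
access_times = [1.0, 1.5, 2.0, 2.5, 3.0]

slope, intercept = fit_line(weeks, access_times)
print(slope, intercept)                      # fitted drift per week and starting level
print(predict(slope, intercept, 6))          # extrapolated access time at week 6
print(crossing_point(slope, intercept, 4.0)) # week at which a 4-second level is reached
```

A straight line is the simplest choice; the text allows for other curve shapes where the data call for them.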

With the functionality of the present invention having 
now been described, additional understanding may be had 
with further reference to the system 100 shown in FIG. 1, in 
which the present invention may be employed. 

In devising a proper monitoring scheme for querying
database 105 or 115, it is apparent that monitors 310 taking
system state information would be desired at least at user
workstations 140 and 145, at which the queries may be
made, a network hub 135, and the aforedescribed servers
110 and 120. State information would be desired at a 
minimum of these locations since all are directly involved in 
the path of required communication. With monitors placed 
at the aforementioned locations, it would be possible to 
define SLAs for both client- and server-side performance. 

Furthermore, since one of the objects of the present 
invention is to uncover hidden or unexpected relations, a 
monitor 310 may also be placed at a printer 155, servicing 
the workstations 140 and 145, and synchronized with the test 
program of the SLA for querying database 115 or 105. 
Although it would not typically be expected for printer 155 
to have any relation with the performance of workstation 
140 or 145 users querying databases 115 or 105, the printer 
155 is physically coupled to workstations 140 and 145, 
which themselves are coupled through the network 100 to 
the servers 110 and 120, as well as another server 160 and 
potentially many more components via the network hub 135. 
Such coupling can be seen to be a minimum requirement for 
functional interaction between various network 100 ele- 
ments. Additionally, assume a network printer 165, servicing 
the network 100, is only online during certain hours of the 
day. During the hours in which the network printer 165 is 
online, it would be desirable to monitor state information of 
this printer to evaluate the SLA related to querying
databases 105 or 115. 

Although the aforedescribed SLAs were defined with 
respect to the server side, inspection of FIG. 1 indicates why 
it is desirable that separate SLAs and corresponding test 
programs be defined additionally on the client side. For a 
client-side SLA, for instance an SLA for querying database 
105 with the performance threshold defined as that time 
measured for startup of database 105 from initial user query, 
it is seen that these SLAs would not be identical. For this 
client-side SLA, it would be necessary to account for the 
delay encountered from the client-side workstation, either 
140 or 145, through the hub 135 to the server 110. Since this
communication path is not traversed when measuring from
the server side, it is reasonable to expect the threshold
on the client side for this case to be slightly larger than the
server-side threshold.

Furthermore, by having a client-side SLA related to the
same function as an SLA defined to evaluate a server-side
function, additional information may be recovered. In this
example, by taking monitoring data on client-side information
and having separate SLAs and separate test programs
defined on the client side, information would be recovered
that could determine relationships between the specified
function, the involved servers and workstations, and the
network hub 135. By defining and operating the test function
solely server side, the same relations may be found as long
as monitoring activity included workstation and hub states,
but such relations may be determined more quickly by
including SLAs and associated test programs both server
and client side.

Consistent with the ongoing discussion, when all network
100 elements are functioning and monitored, there are SLAs
defined client side and server side for the example database
queries within the architecture depicted in FIG. 1. There
will, therefore, be test programs launched client side and
server side that simulate these queries from their respective
sides of the network 100. Furthermore, these test programs
are preferably synchronized with the aforementioned monitoring
activities at the above-specified locations, which all
constitute network 100 elements illustrated in FIG. 1.

The above, however, is not intended to suggest that, at
execution of each defined SLA test program, state monitoring
is performed at every available monitor 310. For
instance, when the network printer 165 is taken offline at
controlled and specified intervals, it is not necessary to take
state information on this element when any test programs are
executed. Furthermore, there would likely be network elements
that are identified as physically (or otherwise)
decoupled from those elements involved in certain functions.
If such decoupled elements are properly identified,
monitoring activity on these elements would not be necessary
in the test program execution.
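
One way to picture the selection just described is a filter applied before each test program run: a monitor is sampled only if its element is coupled to the function under test and is online at that moment. The sketch below is an assumption-laden illustration; the `Monitor` type, the function label `query_db_115`, the printer's online hours and the decoupled element are all hypothetical, while the element numerals follow FIG. 1.

```python
from dataclasses import dataclass
from datetime import time
from typing import FrozenSet, List, Sequence


@dataclass(frozen=True)
class Monitor:
    """A state monitor attached to one network element (hypothetical type)."""
    element: str
    coupled_functions: FrozenSet[str]      # functions this element may influence
    online_from: time = time(0, 0)         # default: always online
    online_until: time = time(23, 59, 59)

    def is_online(self, at: time) -> bool:
        return self.online_from <= at <= self.online_until


def monitors_to_sample(monitors: Sequence[Monitor], function: str,
                       at: time) -> List[str]:
    """Elements whose state should be captured for this test program run."""
    return [m.element for m in monitors
            if function in m.coupled_functions and m.is_online(at)]


QUERY = "query_db_115"  # hypothetical label for the tested function
monitors = [
    Monitor("server 110", frozenset({QUERY})),
    Monitor("network hub 135", frozenset({QUERY})),
    # Assumed online hours for the network printer; not given in the text.
    Monitor("network printer 165", frozenset({QUERY}), time(8, 0), time(18, 0)),
    # An element identified as decoupled from the tested function.
    Monitor("decoupled element", frozenset()),
]

evening = monitors_to_sample(monitors, QUERY, time(20, 0))
morning = monitors_to_sample(monitors, QUERY, time(9, 0))
print(evening)  # ['server 110', 'network hub 135']
print(morning)  # ['server 110', 'network hub 135', 'network printer 165']
```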

Throughout the discussion of the present invention, consideration
has been given to essentially two functions and
the development of thresholds (SLAs), monitoring activity,
and analysis of such data. It should be apparent, however,
that the present invention may include even more of such
functions, with associated test programs, thresholds, associated
synchronized element state monitoring, and subsequent
analysis and model modification, as is understood by
one skilled in the art.

As discussed, further description on additional features of 
the preferred embodiments of the present invention may be 
found in Applicants' co-pending patent application, incor- 
porated herein by reference. 

Although a preferred embodiment of the system and
method of the present invention has been illustrated in the
accompanying Drawings and described in the foregoing
Detailed Description, it will be understood that the invention
is not limited to the embodiment disclosed, but is capable of
numerous rearrangements, modifications and substitutions
without departing from the spirit of the invention as set forth
and defined by the following claims.
What is claimed is: 

1. In an information technology system having a multi-
plicity of interconnected nodes, a method for optimizing
performance monitoring of said system, said method com-
prising the steps of:

(a) performing a continuity analysis on said system;

(b) automatically generating a plurality of performance
models of said system based on periodic measurements
of predefined performance levels;

(c) continuously monitoring, at a plurality of said nodes,
the performance of said system at the respective plu-
rality of said nodes;

(d) collecting, periodically, performance data on said
system at said respective nodes;

(e) applying a plurality of data mining techniques to said
periodically collected system performance data and its
associated test program data;

(f) generating a decision tree using said periodically
collected system performance data, said decision tree
having a multiplicity of decision nodes, each said
decision node corresponding to a component of said
system;

(g) comparing a plurality of relationships within said
system between said system performance data and said
test program data;

(h) automatically modifying said steps of continuously
monitoring and periodically collecting said system per-
formance data at a plurality of said nodes, whereby said
autonomous modification iteratively optimizes said
continuous performance monitoring of said system;
and

(i) automatically updating an adaptive system model
according to newly discovered relationships.

2. The method according to claim 1, wherein said steps 
(a)-(i) are repeated a plurality of times, whereby said 
continuous performance monitoring of said system is further 
optimized. 

3. The method according to claim 1, further comprising,
prior to said steps of continuous monitoring and periodic
collecting, the step of:

generating a test program pursuant to at least one service
level agreement, said plurality of nodes for continuous
monitoring and periodic performance data collection
being selected pursuant to said at least one service level
agreement.

4. The method according to claim 3, wherein said test
program targets a target component within said system, said
target component being selected from the group consisting
of a system hardware resource and a system software
application.

5. The method according to claim 4, wherein said target
component substantially corresponds to a root decision node
of said decision tree.

6. The method according to claim 4, wherein said target
component targeted by said test program is an underper-
forming system component, whereby said step of automatic
modifying modifies said steps of continuously monitoring
and periodically collecting said performance data on said
underperforming system component.

7. The method according to claim 1, wherein said step of
automatic modifying modifies the periodicity of said con-
tinuous monitoring and periodic collection of said perfor-
mance data.

8. The method according to claim 7, wherein said peri-
odicity increases after said modification.

9. The method according to claim 7, wherein said peri-
odicity decreases after said modification.

10. The method according to claim 1, further comprising
the step of:

storing said performance data.

11. The method according to claim 10, wherein said
performance data is stored with an associated time stamp.

12. The method according to claim 10, wherein said 
performance data is stored in a relational database. 

13. The method according to claim 1, wherein said step of 
generating said decision tree comprises decision tree induc- 
tion. 

14. The method according to claim 1, wherein said 
performance data comprises a respective plurality of state 
information at said plurality of said nodes. 

15. The method according to claim 14, wherein said 
performance data further comprises a respective plurality of 
system information at said plurality of said nodes. 

16. The method according to claim 15, wherein said 
system information comprises a plurality of service level 
agreements. 

17. The method according to claim 1, wherein said 
performance data in said step of periodic collecting is 
collected periodically at specific time intervals. 

18. The method according to claim 17, wherein said 
specific time intervals for periodically collecting said per- 
formance data is selected from the group consisting of days, 
hours, minutes and seconds. 

19. The method according to claim 1, wherein said 
performance data periodically collected in said step of 
periodic collecting has a value selected from the group 
consisting of real numbers, integers and Booleans. 

20. The method according to claim 1, wherein said 
performance data periodically collected in said step of 
periodic collecting is converted to a plurality of Boolean 
values. 

21. The method according to claim 20, wherein said 
plurality of Boolean values correspond to a plurality of 
performance threshold conditions. 

22. The method according to claim 21, wherein said 
performance threshold conditions are predetermined.

23. The method according to claim 21, wherein said 
performance threshold conditions are variable. 

24. The method according to claim 1, wherein said 
performance data periodically collected in said step of 
periodic collecting is averaged at specific time intervals. 

25. The method according to claim 24, wherein said 
averaged performance data is converted to at least one 
Boolean value. 

26. The method according to claim 25, wherein said at 
least one Boolean value corresponds to at least one service 
level agreement within said system. 

27. The method according to claim 1, wherein in said step 
of generating, regression analysis is performed on at least 
one target component corresponding to a target node of said 
decision tree, whereby the performance of said at least one 
target component is predicted at a future time from a 
plurality of parameter data within a plurality of said decision 
nodes. 

28. An information technology system having a multi- 
plicity of interconnected nodes, said system comprising: 

performance means for performing a continuity analysis 
on said system; 

generating means for automatically generating a plurality 
of performance models of said system based on peri- 
odic measurements of predefined performance levels; 

monitor means, for continuously monitoring, at a plurality 
of said nodes, the performance of said system at the 
respective nodes; 

collection means for periodically collecting, at said plu-
rality of said nodes, performance data for said system
at specific time intervals;

data mining technique application means for applying a
plurality of data mining techniques to said collected
system performance data and its associated test pro-
gram data;

decision tree generation means for generating a decision
tree using said periodically collected system perfor-
mance data, said decision tree having a multiplicity of
decision nodes, each said decision node corresponding
to a component of said system;

comparison means for comparing a plurality of relation-
ships within said system between said system perfor-
mance data and said test program data;

modification means for automatically modifying said
monitor and collection means for the continuous moni-
toring and periodic collection, respectively, of said
system performance data; and

updating means for automatically updating an adaptive
system model according to newly discovered relation-
ships.

29. The system according to claim 28, further comprising:
test program generation means for generating a test pro-
gram pursuant to at least one service level agreement.

30. The system according to claim 29, wherein said test
program targets a target component within said system.

31. The system according to claim 30, wherein said target
component substantially corresponds to a root decision node
of said decision tree.

32. The system according to claim 31, wherein said target
component is selected from the group consisting of a system
hardware resource and a system software application.

33. The method according to claim 31, wherein said target
component substantially corresponds to a root decision node
of said decision tree.

34. The method according to claim 33, wherein said target
component targeted by said test program is an underper-
forming system component, whereby said modification
means automatically modifies the continuous monitoring
and periodic collecting of said performance data on said
underperforming system component by said monitor and
collection means, respectively.

35. The system according to claim 28, wherein said
modification means automatically modifies the periodicity
of said performance data continuous monitoring and peri-
odic collection by said monitor and collection means,
respectively.

36. The system according to claim 35, wherein said
periodicity increases after said autonomous modification.

37. The system according to claim 35, wherein said
periodicity decreases after said autonomous modification.

38. The system according to claim 28, further comprising:
storage means for storing said periodically collected per-
formance data.

39. The system according to claim 38, wherein said
performance data is stored with an associated time stamp.

40. The system according to claim 38, wherein said
storage means is a relational database.

41. The system according to claim 28, wherein said
decision tree generation means generates said decision tree
using decision tree induction.

42. The system according to claim 28, wherein said
performance data comprises a respective plurality of state
information at said plurality of said nodes.

43. The system according to claim 42, wherein said
performance data further comprises a respective plurality of
system information at said plurality of said nodes.

44. The system according to claim 43, wherein said
system information comprises a plurality of service level
agreements.

45. The system according to claim 28, wherein said 
collection means periodically collects said performance data 
periodically at specific time intervals. 

46. The system according to claim 45, wherein said 
specific time intervals for periodically collecting said per- 
formance data is selected from the group consisting of days, 
hours, minutes and seconds. 

47. The system according to claim 28, wherein said 
performance data periodically collected by said collection 
means has a value selected from the group consisting of real 
numbers, integers and Booleans. 

48. The system according to claim 28, wherein said 
performance data collected periodically by said collection 
means is converted to a plurality of Boolean values. 

49. The system according to claim 48, wherein said 
plurality of Boolean values correspond to a plurality of 
performance threshold conditions. 

50. The method according to claim 49, wherein said 
performance threshold conditions are predetermined. 

51. The method according to claim 50, wherein said 
performance threshold conditions are variable. 

52. The system according to claim 28, wherein said 
performance data periodically collected by such collection 
means is averaged at specific time intervals. 

53. The system according to claim 52, wherein said 
averaged performance data is converted to at least one 
Boolean value. 

54. The system according to claim 53, wherein said at 
least one Boolean value corresponds to at least one service 
level agreement within said system. 

55. The system according to claim 28, wherein said 
decision tree generation means employs regression analysis 
on at least one component of said decision tree, whereby the 
performance of said at least one target component is pre- 
dicted at a future time from a plurality of parameter data 
within a plurality of said decision nodes. 
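
By way of non-limiting illustration, the prediction of claim 55 might be sketched with an ordinary least-squares line fitted to past measurements of a target component and evaluated at a future time. The claim does not prescribe a particular regression technique; the use of a single regressor (time), the sample values and the helper names `fit_line` and `predict` are assumptions made for the example only.

```python
from typing import Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx  # assumes the xs are not all identical
    return slope, mean_y - slope * mean_x


def predict(slope: float, intercept: float, x: float) -> float:
    """Evaluate the fitted line at x (for example, a future collection time)."""
    return slope * x + intercept


# Illustrative history: collection time index -> measured value.
times = [0.0, 1.0, 2.0, 3.0]
values = [10.0, 12.0, 14.0, 16.0]
slope, intercept = fit_line(times, values)
forecast = predict(slope, intercept, 5.0)
```

A fuller treatment would regress on the parameter data held at several decision nodes rather than on time alone.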

56. An article of manufacture comprising a computer 
usable medium having computer readable program code 
means embodied thereon for optimizing performance moni- 
toring of at least one node in an information technology 
system, the computer readable program code means in said 
article of manufacture comprising: 

computer readable program code means for: 

(a) performing a continuity analysis on said system; 

(b) automatically generating a plurality of performance 
models of said system based on periodic measure- 
ments of predefined performance levels; 

(c) continuously monitoring, at a plurality of said 
nodes, the performance of said system at the respec- 
tive plurality of said nodes; 

(d) collecting, periodically, performance data on said 
system at said respective nodes; 

(e) applying a plurality of data mining techniques to 
said periodically collected system performance data 
and its associated test program data; 

(f) generating a decision tree using said periodically 
collected system performance data, said decision tree 
having a multiplicity of decision nodes, each said 
decision node corresponding to a component of said 
system; 

(g) comparing a plurality of relationships within said
system between said system performance data and 
said test program data; 

(h) automatically modifying said steps of continuously 
monitoring and periodically collecting said system 
performance data at a plurality of said nodes, 
whereby said autonomous modification iteratively 
optimizes said continuous performance monitoring 
of said system; and 

(i) automatically updating an adaptive system model 
according to newly discovered relationships. 

57. A program storage device readable by a machine and
encoding a program of instructions for executing the method 
steps of claim 1. 

58. The method according to claim 1, wherein said newly 
discovered relationships are uncovered or unexpected relations.

59. The system according to claim 28, wherein said newly 
discovered relationships are uncovered or unexpected rela- 
tions. 
